[Fix](rules) fix result wrong of PushDownAggThroughJoinOnPkFk#59498
[Fix](rules) fix result wrong of PushDownAggThroughJoinOnPkFk#59498morrySnow merged 7 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 33970 ms |
TPC-DS: Total hot run time: 173217 ms |
ClickBench: Total hot run time: 26.83 s |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
1 similar comment
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
ee0201a to
67d181b
Compare
|
run buildall |
TPC-H: Total hot run time: 31162 ms |
TPC-DS: Total hot run time: 173051 ms |
ClickBench: Total hot run time: 27.22 s |
FE Regression Coverage ReportIncrement line coverage |
3215de3 to
4da7874
Compare
|
run buildall |
TPC-H: Total hot run time: 32489 ms |
TPC-DS: Total hot run time: 173347 ms |
ClickBench: Total hot run time: 26.78 s |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-DS: Total hot run time: 172991 ms |
ClickBench: Total hot run time: 26.84 s |
7b1898b to
1c48d96
Compare
1c48d96 to
3fd07b2
Compare
|
run buildall |
TPC-H: Total hot run time: 32025 ms |
TPC-DS: Total hot run time: 173235 ms |
ClickBench: Total hot run time: 27.19 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run vault_p0 |
1 similar comment
|
run vault_p0 |
Related PR: #36035 Problem Summary: The key of the aggregation must include the primary key of the primary key table (or contain a unique key that can form a bijection with the primary key) to push the aggregation to the foreign key table. Before this pr, doris have wrong results in this situation: drop table if exists customer_test; drop table if exists store_sales_test; CREATE TABLE customer_test ( c_customer_sk INT not null , c_first_name VARCHAR(50), c_last_name VARCHAR(50) ); CREATE TABLE store_sales_test ( ss_customer_sk INT, ss_date DATE ); INSERT INTO customer_test VALUES (1, 'John', 'Smith'); INSERT INTO customer_test VALUES (2, 'John', 'Smith'); INSERT INTO store_sales_test VALUES (1, '2024-01-01'); INSERT INTO store_sales_test VALUES (2, '2024-01-01'); alter table customer_test add constraint c_pk primary key (c_customer_sk); alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk); show constraints from customer_test; show constraints from store_sales_test; SELECT DISTINCT c_last_name, c_first_name, ss_date FROM store_sales_test inner join customer_test on store_sales_test.ss_customer_sk = customer_test.c_customer_sk; set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK'; set disable_nereids_rules=''; Turn on PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK will have different result with turn off PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK before this pr. This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation. The original transform was: Agg(group by c_last_name, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) This is an incorrect rewrite because it is not equivalent. This pr corrects the rewrite, allowing the aggregation to be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (a functional dependency exists from a to b, and also from b to a, then a and b have a bijective relationship). For example, Agg(group by c_customer_sk, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies). Furthermore, due to the equality relationship c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can be replaced with `group by ss_customer_sk, ss_date`. The aggregation group by key is entirely replaced with the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table,In this situation, the aggregation can be pushed down.
…#59498) Related PR: apache#36035 Problem Summary: The key of the aggregation must include the primary key of the primary key table (or contain a unique key that can form a bijection with the primary key) to push the aggregation to the foreign key table. Before this pr, doris have wrong results in this situation: drop table if exists customer_test; drop table if exists store_sales_test; CREATE TABLE customer_test ( c_customer_sk INT not null , c_first_name VARCHAR(50), c_last_name VARCHAR(50) ); CREATE TABLE store_sales_test ( ss_customer_sk INT, ss_date DATE ); INSERT INTO customer_test VALUES (1, 'John', 'Smith'); INSERT INTO customer_test VALUES (2, 'John', 'Smith'); INSERT INTO store_sales_test VALUES (1, '2024-01-01'); INSERT INTO store_sales_test VALUES (2, '2024-01-01'); alter table customer_test add constraint c_pk primary key (c_customer_sk); alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk); show constraints from customer_test; show constraints from store_sales_test; SELECT DISTINCT c_last_name, c_first_name, ss_date FROM store_sales_test inner join customer_test on store_sales_test.ss_customer_sk = customer_test.c_customer_sk; set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK'; set disable_nereids_rules=''; Turn on PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK will have different result with turn off PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK before this pr. This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation. The original transform was: Agg(group by c_last_name, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) This is an incorrect rewrite because it is not equivalent. This pr corrects the rewrite, allowing the aggregation to be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (a functional dependency exists from a to b, and also from b to a, then a and b have a bijective relationship). For example, Agg(group by c_customer_sk, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies). Furthermore, due to the equality relationship c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can be replaced with `group by ss_customer_sk, ss_date`. The aggregation group by key is entirely replaced with the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table,In this situation, the aggregation can be pushed down.
…PkFk apache#59498 (apache#59703) Cherry-picked from apache#59498 Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #36035
Problem Summary:
The key of the aggregation must include the primary key of the primary key table (or contain a unique key that can form a bijection with the primary key) to push the aggregation to the foreign key table.
Before this pr, doris have wrong results in this situation:
set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK';
set disable_nereids_rules='';
Turn on PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK will have different result with turn off PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK before this pr.
This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation.
The original transform was:
This is an incorrect rewrite because it is not equivalent.
This pr corrects the rewrite, allowing the aggregation to be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (a functional dependency exists from a to b, and also from b to a, then a and b have a bijective relationship).
For example,
Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies).
Furthermore, due to the equality relationship c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case,
group by c_customer_sk, ss_datecan be replaced withgroup by ss_customer_sk, ss_date.The aggregation group by key is entirely replaced with the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table,In this situation, the aggregation can be pushed down.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)